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ABSTRACT 



C^ I In a previous paper ( [Varadi et al. 1999|) , Random Lag Singular Spectrum 

Analysis was offered as a tool to find oscillations in very noisy and long time 



series. This work presents a generalization of the technique to search for 

■^ ' common oscillations in two or more time series. 

(N' 
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^ ■ Subject headings: Sun: oscillations — methods: data analysis 
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^' 1. INTRODUCTION 

O 

^, 

c/3 . This paper follows up on our previous one ([Varadi et al. 1999| ) in which Random Lag 



Singular Spectrum Analysis was described at length. The technique is being used to search 
for low-frequency solar acoustic and gravity oscillations in the Global Oscillations at Low 
Frequency (GOLF; [Gabriel 1998| ) and Michelson Doppler Imager (MDI; ^cherrer 1998[) 



data. It has became apparent, however, that identifying common, simultaneous oscillatory 
components in these data is a more promising approach. Hence the need arose to generalize 
the technique for two or more time series which is described here briefly. 



Singular Spectrum Analysis (SSA; [Vautard fc Ghil 1989[ ; [Vautard et al. 1992| [Dettinger 



et al. 1995[) was originally developed to search for oscillation in short and noisy time series 



that one typically encounters in geophysics. The technique computes the eigenvectors of 
autocorrelation matrices. The sizes of the latter are usually a third of the length of the 
time series which limits the feasibility of the method to time series no longer than a few 
thousand. It has been employed in the analysis of GOLF and MDI data by extracting the 
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signal in narrow frequency bands which reduces the length of the time series to which SSA 
is apphed ( [Ulrich et al. 199^ ; |Varadi et al. 19981) . This, however, made it difiicuh to assess 
the importance of candidate modes in the signal as a whole. Our previous paper on Random 
Lag Singular Spectrum Analysis ( |Varadi et al. 19lJD| ) explored connections between linear 
dynamical systems and SSA. It was shown that one can work with sequences of random 
lags when dealing with matrices of autocovariances, which are the mainstay of SSA and 
numerous other techniques such as autoregressive modeling (AR) (e.g., Percival fc Walden 



|1993| ; [Proakis fc Manolakis 199^ . This approach makes it possible to carry out SSA in wide 
frequency bands, on time series having tens of thousands of points. Here a generalization 
of the technique. Random Lag Singular Cross-Spectrum Analysis (RLSCSA), is introduced 
through a few equations, without exploring deep mathematical issues. Preliminary results 
obtained by RLSCSA are quite encouraging and it seems appropriate to make the technique 
accessible to the community expeditiously. 



2. RANDOM LAG SINGULAR CROSS-SPECTRUM ANALYSIS 



The technique searches for coincident patterns of oscillations in two given time series, 
X = Xi,X2, ■ . . ,xiy and y = yi,y2, ■ ■ ■ iVn with the same uniform sampling in time. Random 
Lag Singular Spectrum Analysis ([Varadi et al. 199^ ) does so in the case of a single time 
series, i.e, x = y, hy looking at possible linear relationships between its lagged copies. The 
reasoning relies on straightforward but lengthy linear algebra. In short, a linear system 
produces time series in which consecutive values are linear related, at least approximately. 
This is analogous to the point of view of AR models ( Percival fc Walden 1993]) but it is 
more general. The most important difference, however, is that SSA and its generalizations 
do not try to fit an AR model to very noisy data but rather try to extract what appears to 
be signal from a noisy background. 

Both time series are assumed to have zero mean. First two matrices are formed, one of 
which is 



D, 



Xj-l+ki 


Xj-l + k2 ■ 


■ ^j-l+kM 


Xj+ki 


^j+k2 


Xj+kM 


Xj+l+ki 


Xj+l+k2 ■ 


■ ^j+l+kM 



:2-i) 



whose columns are index-shifted (lagged) copies of the time series x. Here the lags 
ki,k2, . . . , kM are all different, random positive integers uniformly distributed between 1 
and a maximum lag K. With some other set of random lags, the analogous matrix D^ is 



-3- 



forined for y. The two sets of lags, for x and y, may or may not contain common elements. 
For the sake of exposition, the data matrices D^. and D^ are padded with zeros for those 
indeces of x and y for which no value is available, i.e., when the index is smaller than one 
or larger than A^. In practice, these data matrices are not used directly. Then the M x M 
matrix 

C = D^D, (2-2) 

is formed which consists of covariances at various lags, except for normalization factors (^ 
denotes transpose). 

The essence of the method is to compute the Singular Value Decomposition (SVD; 
Golub fc Van Loan 1996|) of this matrix, i.e.. 



C = D^D, = E^AEj^, (2-3) 

where E^^. and Ey are orthogonal matrices. The matrix A is diagonal and contains the 
so-called singular values. From (^^1) i^ follows that 

(D,E,.)^ CDy^y) = A. (2-4) 

The columns of the N x M matrix D^-Ea, represent filtered versions of the original time 
series x, while D^Ey does the same for y. Clearly, the singular values can be interpreted as 
covariances between these filtered time series. One can also observe that the zth column of 
J^x^x has nonzero covariance only with the zth column of DyEy. 

Next the columns of D^E^^ are modeled by the columns of D^E^. using ordinary linear 
regression, the zth column in the former by the ith column in the latter. For this, the 
variances of the columns in both matrices have to be computed. In the case of x, they are 
the diagonal elements of the M x M matrix 

{BxExf (BxEx) = El {b^Bx) Ex (2-5) 

and an analogous formula applies in the case of y. Since the original x and y have zero 
means, the same is true for the columns of BxEx and D^E^^, at least approximately in the 
case of large A^ and no trends in the data. Hence one needs to compute only the scaling 
coefficients in the linear regression models between these columns. These can be collected 
to form a diagonal matrix B^. to obtain 

(D^^E,) = D,E,B,, (2-6) 

where ^ signifies that this is a statistical model. Next one defines the model for the columns 
of By as 

D; = D,E,B,Ej^. (2-7) 
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Finally, we have to create a new time series y from Dj,. Each column of the latter is 
index-shifted relative to the original indexing of y and thus one could use any column as y. 
Alternatively, one can simply unshift each column and average them to obtain 

1 M M M 

^" = M E E E ^n+(.f-.|) {BD^ {e:\ (ei) . , (2-8) 

^*^ s=l i=l j=l •' 

where kf is the ith lag for x, k^ is the jth lag for y, s designates the sth column of a matrix 
and the subscripts i and j are row indices. 

The last equation describes how to model oscillations in y with those in x by 
moving-average or finite impulse response filters. Very roughly, the oscillations in x are 
represented by E^. and those in y hj Ey, while B^^ contains scaling factors. At this point 
one can separate signal and noise by not taking into account all the linear regression models 
between x and y. When the cross-correlation is large for some component s, it is more likely 
that the same oscillation is present in both time series. Hence one would not sum for all 
s = 1, . . . M but for only some of them. Once the filter has been determined, perhaps the 
best way to proceed is by computing its spectral response i.e., its z transform which boils 
down to computing its Fourier transform ( Proakis fc Manolakis 1996| ). This is analogous to 



the Maximum Entropy Method (|Percival fc Walden 1993|). 



When X and y are the same and the same lags are used, the filtering formula above 
is the same as in Random Lag Singular Spectrum Analysis with B^ being identity matrix 
( IVaradi et al. 1999| ). Therefore, RLSCSA is a direct generalization of Random Lag Singular 
Spectrum Analysis. Furthermore, as in the case of the latter, in RLSCSA the analysis can 
be carried out for several different lag sequences and the filters can be averaged for these 
cases. Also, a larger number of lags, M, always provides better results. 

There can be a number of variations on the basic construction above. For instance, in 
the case oi x = y one can use lag sequences k^ and k^ which do not have common elements. 
This doubles the number of autocovariances one would include in the matrix C as compared 
to Random Lag Singular Spectrum Analysis for the same number of lags M. In the case of 
X ^ y, one could use the same lag sequence (/c^ = k^) or different ones. When several time 
series are given, divided into two groups, x^^\x^'^\ . . . , x^^^^ and y^^\ y^'^\ . . . , y^^y\ one can 
form the data matrix 

[D^(i),D^(2), . . .D^(d,)] (2-9) 

and an analogous one for the other set of time series. In these matrices the columns 
come from possibly different time series. The formulae above needs only straightforward 
modifications for this case although indexing becomes somewhat complicated. In (|2-8|) , one 
would have filters operating on the x signals whose sum would model oscillations in yk) for 

t i, /, . . . , Uy. 
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There can be phase lags between x and y. There should be none, however, when they 
are simultaneous and independent measurements of the same physical quantity. In such 
cases, one can use the phase information in the filter (^^1) ^° further separate signal and 
noise. If E^; and E^ operate on the same signal, then the phases obtained should be close 
to each other. Hence one would consider as signal only those peaks in the spectral response 
of the filter which have nearly zero phase. Such a "phase weeding" seems to provide better 
results but further work is needed to fully develop this idea. 



3. IMPLEMENTATION 

For clarity, we provide a description of how the computations are done in practice. 
Instead of products of lagged time series, estimates of covariances are used to create the 
matrix C. First of all, we subtract from x its mean to make sure that it is zero and also 
devide the resulting x by its standard deviation to ensure that the numerical values are 
in a reasonable range. The same is done for y. Next one has to determine cross- and 
autocovariances which, in turn, are computed by fast convolution algorithms (e.g, |Percival] 



fc Walden 1993| ; [Proakis fc Manolakis 1996| |Varadi et al. 199"9| ). For that, the quantities 



^3 = J2 ^kVk+j, j = -{K - 1), . . . , K - 1 (3-1) 

k 

are computed the following way. If y denotes the reverse of y, i.e., 

yi = yN-i+i, (3-2) 

and the number N2 is at least 2N — 1, then 



1 

No 



q= —BFT],l (DFT^,(a;) BFT^M) ■ (3-3) 



Here DFTatj (x) denotes the complex Fourier transform on A'2 points without normalization, 
i.e., without dividing the transform with N2, in either the direct or inverse transform. In 
formula, 

Ar2-1 N2~l 

DFT^,(z)fc = Y^ 2,■e-2-^■^/^^ DFT^^^(^), = Y. Zkc'^^'/'^^. (3-4) 

j=0 k=0 

Both X and y are first index-shifted to start with zero index and then are padded by zeros 
to length N2. The latter ensures that the otherwise cyclic (or circular) convolution obtained 
by ( p-3| ) is in fact acyclic (e.g., [Proakis fc Manolakis 19961) . After computing the right hand 



side of (|3-3|) , the result has to be index-shifted by N in the negative direction to obtain 
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the values of q with the correct indexing since the A^th element on the right hand side is 
actually %. Next we compute 

'^^■ = iV^^^' 3 = -{K-l),...,K-l, (3-5) 

which are unbiased estimates of the covariances — which can proved along the same lines 
as for autocorrelations ( [Froakis fc Manolakis 19^BD . 



Next the M-long sequences of random lags are selected, k^ and A;^, using a random 
number generator. Having the latter, one can form the matrix C, more exactly its multiple 
by some factor. One has 

Cij = qk^-kf- (3-6) 

The SVD of C is computed the following way but there are other methods ( |Golub fc Van] 



Loan 1996). First one computes the symmetric matrix 



C^C = ByA^E^ (3-7) 

whose eigenvectors are the columns of E^^. Any standard high-performance algorithm for 
the symmetric eigenvalue problem (|Golub fc Van Loan 1996|) can be used to compute Ej^. 
For the matrix E^,. one has 

CE, = E,A, (3-8) 

which means that E^^ can be computed by normalizing the columns of the matrix on the 
left hand side. The normalizing factors are the singular values which are all nonnegative if 
these formulae are used. One has to be careful when very small singular values are present 
but this did not occur in the cases we dealt with so far. 



The autocovariances in ( |2-5| ) are computed as above with x replacing y. The diagonal 
matrix of scaling coefficients, B^^^, is determined by the standard formula 

Bk = ^ (3-9) 

^k 

where the index k refers to the k element in the diagonals and a^ is the kth element in the 
diagonal of ( P-5|) . Finally, the filters are computed using ( |2-8| ) . In this, one has to make 
choices which components should be included, i.e., for which values of s should the sum be 
restricted. Obviously, one should compute the correlation coefficients 

r. = ^ (3-10) 

where the superscript of a is used to distinguish between the standard deviations of the 
kth columns of J^x^x and JDyEy. Then the filter coefficients in ( P-^ ) are collected for 
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those s for which r^ are the largest. The computational steps starting with the selection 
of random lags are repeated for a number of different selections of them and the resulting 
filter coefficients are averaged, as in the case of Random Lag Singular Spectrum Analysis 
( IVaradi et ai. T^ ). 



4. AN EXAMPLE 

It is worthwhile to present a "proof of concept" example. A second-order autoregressive 
model, i.e., a discrete, damped oscillator, was used to generate a signal by forcing the 
model with white noise. The signal has 50000 points from which two subsequences were 
selected, each 5000 long, separated from each other by 40000. In the top panel of Fig. |l], the 
Fourier spectrum of one of the subsequences is shown. Then white noise was added to both 
subsequences. As can be seen in the middle panel of Fig. |l], the cross-spectrum amplitudes 
of the two noisy subsequences reveal nothing about original signal. The bottom panel shows 
the spectral response of the RLSCSA filters. Here 500 lags were used, the maximum lag 
being 2500, and the filters in ( p-^ ) were averaged for 100 different sets of random lags. The 
components with largest two correlations were included in the filter for each selection of 
lags. The signal is recovered as the third largest peak in the spectral response which we find 
quite encouraging. This example also illustrates that one should expect to see a number 
of noise peaks beside signal peaks. Still, one has to deal with a few peaks in the RLSCSA 
results, while the cross-spectrum exhibits hundreds of them. 

We thank M. Dettinger, I. Fodor, M. Ghil, J. Leibacher, J. Pap, C Tatsuoka and P. 
Yiou for enlightening discussions on time series analysis and statistics. SOHO is a project 
of international cooperation between ESA and NASA. This research is supported by a 
NASA subcontract to UCLA through Stanford University. The partial financial support of 
NSF Grant AST96-19574 and NASA Grant NAG5-6680 (F.V.) is also acknowledged. 
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Fig. 1. — Top panel: Fourier spectrum of one subsequence of the signal. Middle panel: 
Amplitude of the cross-spectrum of two subsequences of the noisy signal. The original signal 
is completely hidden in noise. Bottom panel: RLSCSA filter response. The third largest 
peak is the original signal. 



